Estimating Uncertainty of Categorical Web Data
Abstract
Web data often manifest high levels of uncertainty. We focus on categorical Web data and represent these uncertainty levels as first- or second-order uncertainty. By means of concrete examples, we show how to quantify and handle these uncertainties using the Beta-Binomial and Dirichlet-Multinomial models, as well as how to take into account possibly unseen categories in our samples by using the Dirichlet Process.
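The three models named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it uses only the standard conjugate-prior formulas, and the function names, the symmetric prior weight `prior`, and the concentration parameter `alpha` are assumptions made for illustration.

```python
from collections import Counter


def beta_binomial_posterior(successes, failures, a=1.0, b=1.0):
    """Posterior mean and variance of a binary category's probability.

    Assumes a Beta(a, b) prior (uniform by default) updated with
    binomial counts; the variance expresses second-order uncertainty.
    """
    a_post = a + successes
    b_post = b + failures
    n = a_post + b_post
    mean = a_post / n
    var = a_post * b_post / (n ** 2 * (n + 1))
    return mean, var


def dirichlet_multinomial_posterior_mean(observations, prior=1.0):
    """Posterior mean probability of each observed category.

    Assumes a symmetric Dirichlet prior with weight `prior` per category
    over the categories that actually appear in `observations`.
    """
    counts = Counter(observations)
    total = sum(counts.values()) + prior * len(counts)
    return {cat: (k + prior) / total for cat, k in counts.items()}


def dp_unseen_probability(n, alpha=1.0):
    """Probability that the next observation belongs to a new category.

    Uses the Dirichlet Process predictive rule alpha / (n + alpha),
    where n is the number of observations seen so far.
    """
    return alpha / (n + alpha)


if __name__ == "__main__":
    print(beta_binomial_posterior(8, 2))
    print(dirichlet_multinomial_posterior_mean(["a", "a", "b"]))
    print(dp_unseen_probability(9))
```

With few observations the posterior variance stays large and the probability reserved for unseen categories stays high; both shrink as the sample grows.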
Similar resources
Uncertainty Estimation and Analysis of Categorical Web Data
Web data often manifest high levels of uncertainty. We focus on categorical Web data and we represent these uncertainty levels as first- or second-order uncertainty. By means of concrete examples, we show how to quantify and handle these uncertainties using the Beta-Binomial and the Dirichlet-Multinomial models, as well as how to take into account possibly unseen categories in our samples by using th...
A Comparative Study of Performance of Adaptive Web Sampling and General Inverse Adaptive Sampling in Estimating Olive Production in Iran
Nowadays, there is an increasing use of sampling methods in network and spatial populations. Although the most common link-tracing designs such as adaptive cluster sampling and snowball sampling have advantages over conventional sampling designs such as simple random sampling and cluster sampling, these designs still present many drawbacks. Adaptive web sampling is a new link-tracing design tha...
SSDR: An Algorithm for Clustering Categorical Data Using Rough Set Theory
Today, a large number of clustering algorithms are available to group objects having similar characteristics. But implementing many of those algorithms is challenging when dealing with categorical data. While some of the algorithms available at present cannot handle categorical data, others are unable to handle uncertainty. Many of them have the stabilit...
Categorical models for spatial data uncertainty
Considerable disparity exists between the current state of the art for categorical spatial data error modeling and the current state of the practice for reporting categorical data quality. On one hand, the general Monte Carlo simulation-based error propagation framework is a fixture in spatial data error handling; researchers have identified potentially powerful approaches to characterizing cat...
MMR: An algorithm for clustering categorical data using Rough Set Theory
A variety of cluster analysis techniques exist to group objects having similar characteristics. However, the implementation of many of these techniques is challenging due to the fact that much of the data contained in today’s databases is categorical in nature. While there have been recent advances in algorithms for clustering categorical data, some are unable to handle uncertainty in the clust...